Big Data Growth Drivers

edureka!

Data Generated Every 60 Seconds

[Figure: data generated every 60 seconds, e.g. "Users upload ... of new video"]

EDUREKA HADOOP CERTIFICATION TRAINING www.edureka.co/big-data-and-hadoop

































Global Mobile Data Traffic, 2015 to 2020

[Chart: mobile data traffic in exabytes per month]

Cisco Forecasts 30.6 Exabytes per Month of Mobile Data Traffic by 2020

3 major trends contributing to the growth of mobile data traffic:

> Adapting to Smarter Mobile Devices

> Defining Cell Network Advances: 2G, 3G, and 4G (5G Perspectives)

> Reviewing Tiered Pricing: Unlimited Data and Shared Plans

Source: http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/mobile-white-paper-c11-520862.html










What is Big Data?

"Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications"

The five Vs: Volume, Variety, Velocity, Value, Veracity

> Volume: Processing increasingly huge data sets

> Variety: Processing different types of data

> Velocity: Data is being generated at an alarming rate

> Value: Finding correct meaning out of the data

> Veracity: Uncertainty and inconsistencies in the data

















Let us understand Problems with Big Data and Traditional System with a Story

Story of Big Data & Traditional System

Scenario:

Bob has opened a small restaurant in his city

Traditional Scenario:

> 2 orders per hour

> Data is generated at a steady rate and is structured in nature

> Traditional Processing System: RDBMS

Failure of Traditional System


Scenario 2:

> They started taking online orders

> 10 orders per hour

Single Cook (Regular Computing System)

Food Shelf (Data)

Big Data Scenario:

Heterogeneous data is being generated at an alarming rate

Traditional Processing System: RDBMS

Issue 1: Too Many Orders Per Hour
Solution: Hiring Multiple Cooks

Need of an Effective Solution

Scenario:

Multiple cooks cooking food (multiple processing units for data processing)

Issue:

Food Shelf becomes the BOTTLENECK (bringing data to the processing units generates a lot of network overhead)

[Figure: multiple processing units all reading from a single Data Warehouse]

Issue 2: Food Shelf becomes the Bottleneck
Solution: Distributed and Parallel Approach

Effective Solution

[Figure: one cook cooks meat, another cooks sauce, and the results are assembled into meat sauce, all working from a Distributed Food Shelf]

Need of a Framework
Apache Hadoop: Framework to Process Big Data

Hadoop is a framework that allows us to store and process large data sets in parallel and distributed fashion
Hadoop: Master/Slave Architecture

Scenario:

A project manager is managing a team of four employees. He assigns a project to each of them and tracks the progress.

[Figure: the Project Manager as the master node, with team members such as James and Bob as SLAVE NODES]
HDFS Core Components:

NameNode, DataNode, Secondary NameNode

NameNode & DataNode

NameNode:

> Maintains and manages DataNodes

> Records metadata, i.e. information about data blocks, e.g. location of blocks stored, the size of the files, permissions, hierarchy, etc.

> Receives heartbeat and block report from all the DataNodes

DataNode:

> Slave daemons

> Stores actual data

> Serves read and write requests from the clients
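The heartbeat idea above can be illustrated with a small plain-Java sketch. This is a toy model, not HDFS code: the class `HeartbeatTracker`, its method names, and the timeout value are all hypothetical and chosen only for illustration. It shows how a master could decide that a DataNode has failed once heartbeats stop arriving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of heartbeat bookkeeping on the NameNode side; not HDFS code. */
final class HeartbeatTracker {
    // DataNode id -> time (ms) of the most recent heartbeat
    private final Map<String, Long> lastSeenMillis = new TreeMap<>();
    // Assumed threshold, chosen by the caller
    private final long timeoutMillis;

    HeartbeatTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever a DataNode reports in. */
    void recordHeartbeat(String dataNodeId, long nowMillis) {
        lastSeenMillis.put(dataNodeId, nowMillis);
    }

    /** A node is alive if it is known and has reported within the timeout. */
    boolean isAlive(String dataNodeId, long nowMillis) {
        Long last = lastSeenMillis.get(dataNodeId);
        return last != null && nowMillis - last <= timeoutMillis;
    }

    /** Ids of all known nodes whose last heartbeat is too old, in sorted order. */
    List<String> deadNodes(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (String id : lastSeenMillis.keySet()) {
            if (!isAlive(id, nowMillis)) {
                dead.add(id);
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatTracker tracker = new HeartbeatTracker(10_000L);
        tracker.recordHeartbeat("dn1", 0L);
        tracker.recordHeartbeat("dn2", 8_000L);
        // At t = 12 s, dn1 has been silent for longer than the 10 s timeout
        System.out.println(tracker.deadNodes(12_000L));
    }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic; a real daemon would read the system clock instead.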
HDFS Core Components:

NameNode, DataNode, Secondary NameNode

Secondary NameNode & Checkpointing

> Checkpointing is a process of combining edit logs with FsImage

> Secondary NameNode takes over the responsibility of checkpointing, therefore making NameNode more available

> Allows faster failover as it prevents edit logs from getting too huge

> Checkpointing happens periodically (default: 1 hour)
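To make "combining edit logs with FsImage" concrete, here is a minimal plain-Java sketch. It is only an illustration under simplifying assumptions: the namespace image is modelled as a map from path to file size, and an edit is either a create or a delete. The class `CheckpointSketch` and its `Edit` type are hypothetical and do not correspond to Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy checkpoint: replay an edit log over a namespace image. Not Hadoop code. */
final class CheckpointSketch {

    /** One namespace change; size is ignored for deletes. */
    static final class Edit {
        final boolean create;
        final String path;
        final long size;

        Edit(boolean create, String path, long size) {
            this.create = create;
            this.path = path;
            this.size = size;
        }
    }

    /** Returns a new image with every edit applied in order; the input image is left untouched. */
    static Map<String, Long> checkpoint(Map<String, Long> fsImage, List<Edit> editLog) {
        Map<String, Long> merged = new TreeMap<>(fsImage);
        for (Edit edit : editLog) {
            if (edit.create) {
                merged.put(edit.path, edit.size);
            } else {
                merged.remove(edit.path);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> image = new TreeMap<>();
        image.put("/a.txt", 10L);

        List<Edit> log = new ArrayList<>();
        log.add(new Edit(true, "/b.txt", 20L));  // file created after the last checkpoint
        log.add(new Edit(false, "/a.txt", 0L));  // file deleted after the last checkpoint

        // After the merge the old log can be discarded and a fresh, empty one started
        System.out.println(checkpoint(image, log));
    }
}
```

The shorter the edit log, the less there is to replay at startup, which is why periodic checkpointing helps recovery time.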


[Figure: checkpointing. The Secondary NameNode takes a first-time copy of the fsImage and the editlog from the NameNode and merges them into the final FsImage, while the NameNode writes to a temporary new editlog during the checkpoint.]
How is the data actually stored in DataNodes?

HDFS Data Blocks

> Each file is stored on HDFS as blocks

> The default size of each block is 128 MB in Apache Hadoop 2.x (64 MB in Apache Hadoop 1.x)
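As a worked example of the block sizes above, the following plain-Java sketch computes how many blocks a file is split into and how large the last block is. The class `BlockMath` and the 380 MB file size are assumptions made for illustration only.

```java
/** Block arithmetic for a file split into fixed-size blocks. Illustrative only. */
final class BlockMath {
    static final long MB = 1024L * 1024L;

    /** Number of blocks needed for the file (ceiling division). */
    static long blockCount(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and block size positive");
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Size of the final block; it holds only the leftover bytes, not a full block. */
    static long lastBlockBytes(long fileBytes, long blockBytes) {
        if (fileBytes == 0) {
            return 0;
        }
        long remainder = fileBytes % blockBytes;
        return remainder == 0 ? blockBytes : remainder;
    }

    public static void main(String[] args) {
        long file = 380 * MB;   // hypothetical file size
        long block = 128 * MB;  // Hadoop 2.x default block size
        System.out.println(blockCount(file, block) + " blocks, last block "
                + lastBlockBytes(file, block) / MB + " MB");
    }
}
```

With these numbers the file splits into blocks of 128 MB, 128 MB, and 124 MB; with the 64 MB block size of Hadoop 1.x the same file would need 6 blocks.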
Fault Tolerance: How does Hadoop cope with DataNode failure?

Fault Tolerance

Scenario:

One of the DataNodes containing the data blocks crashed

[Figure: a NameNode and several DataNodes, one of which has failed]

Fault Tolerance: Replication Factor

Solution:

Each data block is replicated (thrice by default) and the replicas are distributed across different DataNodes

As it is said: Never Put All Your Eggs in the Same Basket
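The effect of replication can be sketched in plain Java. This is a deliberately simplified model: replicas are assigned round-robin, whereas real HDFS placement also takes racks into account. The class `ReplicationSketch`, its methods, and the node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy replica placement: round-robin, ignoring racks. Not the HDFS placement policy. */
final class ReplicationSketch {

    /** Assigns each block to `replication` distinct DataNodes. */
    static Map<String, List<String>> place(List<String> blocks, List<String> dataNodes, int replication) {
        if (replication < 1 || replication > dataNodes.size()) {
            throw new IllegalArgumentException("replication must be between 1 and the number of DataNodes");
        }
        Map<String, List<String>> placement = new LinkedHashMap<>();
        int start = 0;
        for (String block : blocks) {
            List<String> replicas = new ArrayList<>();
            for (int i = 0; i < replication; i++) {
                replicas.add(dataNodes.get((start + i) % dataNodes.size()));
            }
            placement.put(block, replicas);
            start++; // shift the starting node so load spreads across the cluster
        }
        return placement;
    }

    /** Number of replicas of a block that remain reachable after one node fails. */
    static int survivingReplicas(Map<String, List<String>> placement, String block, String failedNode) {
        int count = 0;
        for (String node : placement.get(block)) {
            if (!node.equals(failedNode)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, List<String>> placement = place(
                Arrays.asList("A", "B"),
                Arrays.asList("dn1", "dn2", "dn3", "dn4"),
                3);
        System.out.println(placement);
        System.out.println("Block A replicas left if dn1 fails: "
                + survivingReplicas(placement, "A", "dn1"));
    }
}
```

Because every block lives on three different nodes, losing any single DataNode still leaves at least two copies of each block to read from and re-replicate.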
HDFS Write Mechanism

HDFS Write Mechanism - Pipeline Setup

[Figure: Setting up HDFS - Write Pipeline. The client, the NameNode, and racks of DataNodes connected through switches, shown for block BLK A]

HDFS Write Mechanism - Writing a Block

[Figure: HDFS - Write Pipeline. The client sends the write request for BLK A (Replica 1) to DataNode 1 in Rack 1, with the pipeline DN1, DN4, DN6. The write request for BLK A (Replica 2) is forwarded through the Core Switch to the next DataNode. Racks shown: Rack 1, Rack 5, Rack 7.]
HDFS Write Mechanism - Acknowledgement

[Figure: Acknowledgement in HDFS - Write. Each DataNode in the pipeline marks BLK A as copied. Acknowledgements (label: "ACK from DN4 & DN6") travel back through the switches to DataNode 1 in Rack 1 and on to the HDFS Client in the Client JVM on the Client Node, which reports "Write Successful". The NameNode's METADATA is UPDATED.]
HDFS Multi-Block Write Mechanism

[Figure: HDFS Multi-Block Write Pipeline. The client writes example.txt as blocks A and B to DataNodes in several racks, each rack behind its own switch and joined by a Core Switch.]

1 GB File = 3 GB Storage in HDFS
3 GB Network Traffic

For Block A: 1A -> 2A -> 3A -> 4A

For Block B: 1B -> 2B -> 3B -> 4B -> 5B -> 6B
HDFS Read Mechanism

[Figure: HDFS - Read Architecture. The HDFS Client in the Client JVM on the Client Node sends a read request for Block A & B to the NameNode, receives the IP addresses of DN1 & DN3, and then reads from DN1 & DN3. DataNodes sit in Rack 1 and Rack 7 behind their switches, joined by a Core Switch.]
HADOOP CORE COMPONENTS

Let us understand MapReduce with a story

Story of MapReduce

Each student has to count the occurrences of the word Julius in the book

Map:

Each student counts the number of occurrences in one chapter, all in parallel

Reduce:

The professor sums up the answers given by the students to get the final output

What is MapReduce?

MapReduce is a programming framework that allows us to perform distributed and parallel processing on large data sets in a distributed environment
MapReduce Word Count Program

The Overall MapReduce Word Count Process

Input:
Deer Bear River
Car Car River
Deer Car Bear

Splitting (K1, V1):
Deer Bear River
Car Car River
Deer Car Bear

Mapping, List(K2, V2):
Deer, 1; Bear, 1; River, 1
Car, 1; Car, 1; River, 1
Deer, 1; Car, 1; Bear, 1

Shuffling, K2, List(V2):
Bear, (1,1)
Car, (1,1,1)
Deer, (1,1)
River, (1,1)

Reducing:
Bear, 2
Car, 3
Deer, 2
River, 2

Final Result, List(K3, V3):
Bear, 2
Car, 3
Deer, 2
River, 2
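The splitting, mapping, shuffling, and reducing stages above can be traced in plain Java without a cluster. This is only a sketch to make the data flow visible. It does not use the Hadoop API, and the class `WordCountFlow` and its method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory walk-through of the word count stages; not the Hadoop API. */
final class WordCountFlow {

    /** Mapping: emit (word, 1) for every token of one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffling: group all values by key, with keys in sorted order. */
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return grouped;
    }

    /** Reducing: sum the list of ones collected for a single word. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Runs all stages over the given lines (each line plays the role of one split). */
    static SortedMap<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapped).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Deer Bear River", "Car Car River", "Deer Car Bear");
        System.out.println(wordCount(input)); // {Bear=2, Car=3, Deer=2, River=2}
    }
}
```

In a real job the map calls run on different nodes and the shuffle moves data over the network; here everything happens in one JVM so that the intermediate values match the stages listed above.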
Three Major Parts of MapReduce Program:

Mapper Code:

You write the mapper logic here, i.e. how the map task will process the data to produce the key-value pairs to be aggregated

Reducer Code:

You write the reducer logic here, which combines the intermediate key-value pairs generated by the Mapper to give the final aggregated output

Driver Code:

You specify all the job configurations here, like job name, input path, output path, etc.
Mapper Code

[Figure: the input text file as key-value pairs, where the key is each line's byte offset and the value is the line text]

Type parameters of Mapper, in order:

> Mapper Key Input Type (byte offset): LongWritable

> Mapper Value Input Type: Text

> Mapper Key Output Type: Text

> Mapper Value Output Type: IntWritable

// Needs java.io.IOException, java.util.StringTokenizer, org.apache.hadoop.io.* and org.apache.hadoop.mapreduce.Mapper
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input line into whitespace-separated words
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // Reuse the Text object as the output key and emit (word, 1)
            value.set(tokenizer.nextToken());
            context.write(value, new IntWritable(1));
        }
    }
}

Mapper Input:

> The key is nothing but the offset of each line in the text file: LongWritable

> The value is each individual line: Text

Mapper Output:

> The key is the tokenized words: Text

> We have the hardcoded value in our case which is 1: IntWritable

> Example: Deer 1, Bear 1, etc.

























Reducer Code

Type parameters of Reducer, in order:

> Reducer Key Input Type: Text

> Reducer Value Input Type: IntWritable

> Reducer Key Output Type: Text

> Reducer Value Output Type: IntWritable

// Needs java.io.IOException, org.apache.hadoop.io.* and org.apache.hadoop.mapreduce.Reducer
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Add up all the ones emitted by the mappers for this word
        int sum = 0;
        for (IntWritable x : values) {
            sum += x.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

Reducer Input:

> Keys are unique words which have been generated after the sorting and shuffling phase: Text

> The value is a list of integers corresponding to each key: IntWritable

> Example: Bear, [1, 1], etc.

Reducer Output:

> The key is all the unique words present in the input text file: Text

> The value is the number of occurrences of each of the unique words: IntWritable

> Example: Bear, 2; Car, 3, etc.
Driver Code

In the driver class, we set the configuration of our MapReduce job to run in Hadoop

Configuration conf = new Configuration();
// new Job(conf, name) is deprecated in Hadoop 2.x; Job.getInstance(conf, name) is the replacement
Job job = new Job(conf, "My Word Count Program");
job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

Path outputPath = new Path(args[1]);

// Configuring the input/output path from the filesystem into the job
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, outputPath);

> Specify the name of the job and the data types of the input/output of the mapper and reducer

> Specify the names of the mapper and reducer classes

> Path of the input and output folder

> The method setInputFormatClass() is used for specifying the unit of work for the mapper

> The main() method is the entry point for the driver
YARN Components

Resource Manager:

> Master daemon that manages all other daemons & accepts job submission

> Allocates first container for the AppMaster

AppMaster:

> One per application

> Coordinates and manages MR jobs

> Negotiates resources from RM

NodeManager:

> Responsible for containers, monitoring their resource usage (CPU, memory, disk, network) & reporting the same to RM

Container:

> Allocates a certain amount of resources (memory, CPU, etc.) on a slave node (NM)
MapReduce Job Workflow

[Figure: job submission on YARN. Labelled steps that survive in the source:]

2. Submit job

3. Get application ID

4.1 Start container (NodeManager)

4.2 Launch AppMaster (AppMaster JVM)

Allocate Resources (Resource Manager JVM on the RM Node)

6.1 Start container (NodeManager)

6.2 Launch task JVM (YARN child)

7. Run MR Task

[Figure: map output goes through partition, sort and spill to disk, alongside output from other maps]
YARN Architecture

[Figure: YARN architecture. Legend: Node Status, Resource Request, MapReduce Status]

Hadoop Architecture: HDFS & YARN

[Figure: the HDFS side with NameNode and Secondary NameNode, and the YARN side with Resource Manager]
Hadoop Cluster

[Figure: a core switch connects one switch per rack. Rack 1 holds the NameNode, the Secondary NameNode, and slave nodes; Rack 2 and Rack 3 hold slave nodes.]
Hadoop Cluster Modes

Standalone (or Local) Mode

> No daemons, everything runs in a single JVM

> Suitable for running MapReduce programs during development

> Has no DFS (Distributed File System)

Pseudo-Distributed Mode

> All Hadoop daemons run on the local machine

Multi-Node Cluster Mode

> Hadoop daemons run on a cluster of machines
Hadoop Ecosystem

[Figure: Hadoop ecosystem. Components legible in the source:]

> Storage

> YARN (Resource Management)

> Spark

> Flume (Unstructured/Semi-structured Data)

> Sqoop (Structured Data)

> HIVE & DRILL (Analytical SQL on Hadoop)

> MAHOUT & Spark MLlib (Machine Learning)

> PIG (Scripting)

> HBASE

> KAFKA & STORM (Streaming)

> SOLR & LUCENE (Searching & Indexing)

> OOZIE (Scheduling)

> AMBARI (Management & Coordination)

















































Hadoop Use Case: Analyzing Olympic Dataset

Problem statement:

> Find the top 10 countries that won the most medals

> Find the total number of gold medals won by each country

> Which countries have won the most medals in swimming?

The data set consists of the following fields:

> Athlete: This field consists of the athlete name

> Age: This field consists of athlete ages

> Country: This field consists of the names of the countries which participated in the Olympics

> Year: This field consists of the year

> Closing Date: This field consists of the closing date of the ceremony

> Sport: Consists of the sport name

> Gold Medals: No. of Gold medals

> Silver Medals: No. of Silver medals

> Bronze Medals: No. of Bronze medals

> Total Medals: Consists of total no. of medals
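The three questions in the problem statement all reduce to "group by country, then sum a medal column", which is the same map-and-reduce shape as word count. The plain-Java sketch below illustrates that logic on a few records. It rests on assumptions the source does not confirm: that records are tab-separated and that fields appear in the order listed above. The class `OlympicMedals` and its methods are hypothetical and do not use the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of the medal aggregations; assumes tab-separated records. */
final class OlympicMedals {
    // Column positions, assuming the field order listed above
    static final int COUNTRY = 2;
    static final int SPORT = 5;
    static final int GOLD = 6;
    static final int TOTAL = 9;

    /** Sums one medal column per country; pass sport = null to include every sport. */
    static Map<String, Integer> totalByCountry(List<String> records, int medalColumn, String sport) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.split("\t");
            if (fields.length < 10) {
                continue; // skip incomplete records
            }
            if (sport != null && !sport.equals(fields[SPORT])) {
                continue;
            }
            int medals = Integer.parseInt(fields[medalColumn].trim());
            totals.merge(fields[COUNTRY], medals, Integer::sum);
        }
        return totals;
    }

    /** Countries with the highest totals first; ties are broken alphabetically. */
    static List<String> topCountries(Map<String, Integer> totals, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(totals.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "Michael Phelps\t23\tUnited States\t2008\t08-24-08\tSwimming\t8\t0\t0\t8",
                "Aleksey Nemov\t24\tRussia\t2000\t10-01-00\tGymnastics\t2\t1\t3\t6",
                "Alicia Coutts\t24\tAustralia\t2012\t08-12-12\tSwimming\t1\t3\t1\t5",
                "Missy Franklin\t17\tUnited States\t2012\t08-12-12\tSwimming\t4\t0\t1\t5");

        System.out.println("Top countries: " + topCountries(totalByCountry(records, TOTAL, null), 10));
        System.out.println("Gold by country: " + totalByCountry(records, GOLD, null));
        System.out.println("Swimming totals: " + totalByCountry(records, TOTAL, "Swimming"));
    }
}
```

In a MapReduce version, the mapper would emit (country, medals) for each record and the reducer would perform the sum, exactly as the reducer in the word count program sums the ones.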
Dataset Description

Athlete                Age  Country        Year  Closing Date  Sport                 Gold  Silver  Bronze  Total
Michael Phelps         23   United States  2008  08-24-08      Swimming              8     0       0       8
Michael Phelps         19   United States  2004  08-29-04      Swimming              6     0       2       8
Michael Phelps         27   United States  2012  08-12-12      Swimming              4     2       0       6
Natalie Coughlin       25   United States  2008  08-24-08      Swimming              1     2       3       6
Aleksey Nemov          24   Russia         2000  10-01-00      Gymnastics            2     1       3       6
Alicia Coutts          24   Australia      2012  08-12-12      Swimming              1     3       1       5
Missy Franklin         17   United States  2012  08-12-12      Swimming              4     0       1       5
Ryan Lochte            27   United States  2012  08-12-12      Swimming              2     2       1       5
Allison Schmitt        22   United States  2012  08-12-12      Swimming              3     1       1       5
Natalie Coughlin       21   United States  2004  08-29-04      Swimming              2     2       1       5
Ian Thorpe             17   Australia      2000  10-01-00      Swimming              3     2       0       5
Dara Torres            33   United States  2000  10-01-00      Swimming              2     0       3       5
Cindy Klassen          26   Canada         2006  02-26-06      Speed Skating         1     2       2       5
Nastia Liukin          18   United States  2008  08-24-08      Gymnastics            1     3       1       5
Marit Bjorgen          29   Norway         2010  02-28-10      Cross Country Skiing  3     1       1       5
Sun Yang               20   China          2012  08-12-12      Swimming              2     1       1       4
Kirsty Coventry        24   Zimbabwe       2008  08-24-08      Swimming              1     3       0       4
Libby Lenton-Trickett  23   Australia      2008  08-24-08      Swimming              2     1       1       4
Ryan Lochte            24   United States  2008  08-24-08      Swimming              2     0       2       4
Inge de Bruijn         30   Netherlands    2004  08-29-04      Swimming              1     1       2       4
Petria Thomas          28   Australia      2004  08-29-04      Swimming              3     1       0       4
Ian Thorpe             21   Australia      2004  08-29-04      Swimming              2     1       1       4
Inge de Bruijn         27   Netherlands    2000  10-01-00      Swimming              3     1       0       4
Gary Hall Jr.          25   United States  2000  10-01-00      Swimming              2     1       1       4
Michael Klim           23   Australia      2000  10-01-00      Swimming              2     2       0       4
Susie O'Neill          27   Australia      2000  10-01-00      Swimming              1     3       0       4